Abstract: We consider the $2^n$ channels synthesized by the $n$-fold application of Arıkan's polar transform to a binary erasure channel (BEC). The synthetic channels are BECs themselves, and we show that, asymptotically for almost all these channels, the pairwise correlations between their erasure events are extremely small: the correlation coefficients vanish faster than any exponential in $n$. Such a fast decay of correlations allows us to conclude that the union bound on the block error probability of polar codes is very tight.

I. Introduction 

Channel polarization is a technique recently introduced by Arıkan [1] as a means of constructing capacity-achieving codes for binary discrete memoryless channels (B-DMCs). The underlying principle of channel polarization is the following. Let $W : \mathcal{X} \to \mathcal{Y}$ be a B-DMC with input alphabet $\mathcal{X} = \mathbb{F}_2$. From two independent copies of $W$ synthesize $W^- : \mathcal{X} \to \mathcal{Y}^2$ and $W^+ : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X}$ as:

$$W^-(y_1, y_2 \mid u_1) = \frac{1}{2} \sum_{u_2 \in \mathcal{X}} W(y_1 \mid u_1 + u_2)\, W(y_2 \mid u_2),$$

$$W^+(y_1, y_2, u_1 \mid u_2) = \frac{1}{2}\, W(y_1 \mid u_1 + u_2)\, W(y_2 \mid u_2).$$

As the superscripts suggest, $W^-$ turns out to be a B-DMC worse than $W$, while $W^+$ is a better B-DMC than $W$. This transform can be repeated $n$ times to get $N = 2^n$ B-DMCs $W_n^{(s)}$, $s \in \{-,+\}^n$. Arıkan shows that (i) the transformation preserves the mutual information, and (ii) the $W_n^{(s)}$'s approach "extremal" channels, i.e., either noiseless or useless channels. In particular, the fraction of almost noiseless channels is equal to the symmetric capacity of the original B-DMC $W$. Based on these properties, Arıkan constructs polar codes by sending uncoded data bits only on the (almost) noiseless channels and arbitrary (but known to the receiver) bits on the remaining channels. The channels used to transmit information are referred to as "information" channels and the rest are called "frozen" channels. Arıkan also proposed a successive cancellation decoder that decodes the information bits with complexity $O(N \log N)$ and showed that its block error probability behaves roughly as $O(2^{-\sqrt{N}})$ (cf. [2]).

The set of binary erasure channels (BECs) is stable under polarization in the sense that if $W$ is a BEC, then $W^+$ and $W^-$ are also BECs. We denote a BEC with erasure probability $e$ as BEC($e$). Observe that one can establish a one-to-one relationship between a BEC($e$) and an "erasure indicator" random variable $E$ such that $E \in \{0, 1\}$ and $\mathbb{P}[E = 1] = e$. The polar transform of a BEC is hence equivalent to taking two independent copies of $E$ and creating the erasure indicators of $W^-$ and $W^+$.

Lemma 1 (Polar Transform of BEC [1, Proposition 6]). If $W$ is a BEC with erasure probability $e$, applying the polar transform $(W, W) \mapsto (W^-, W^+)$ produces two BECs: $W^+$ with erasure probability $e^2$ and $W^-$ with erasure probability $2e - e^2$. Moreover, $W^-$ erases iff either copy of $W$ erases, and $W^+$ erases iff both copies of $W$ erase.

Corollary 1. The erasure indicators of $W^-$ and $W^+$, denoted by $E^-$ and $E^+$, are constructed from two independent copies of $E$, denoted by $E$ and $E'$, as:

$$E^- = \max\{E, E'\} = E + E' - EE', \tag{1a}$$

$$E^+ = \min\{E, E'\} = EE'. \tag{1b}$$

While the two copies of $E$ are independent (and hence uncorrelated), $E^+$ and $E^-$ are correlated: $E^+ = 1$ implies $E^- = 1$. On the other hand, by polarization the $W_n^{(s)}$'s (and equivalently the $E_n^{(s)}$'s) become deterministic as $n \to \infty$. Hence it looks like $E_n^{(s)}$ and $E_n^{(t)}$ would become uncorrelated for $s \neq t$, where $s$ and $t$ are sign sequences of length $n$ used for indexing the channels. In particular, it is easy to see that $\mathbb{E}[E_n^{(s)} E_n^{(t)}] - \mathbb{E}[E_n^{(s)}]\,\mathbb{E}[E_n^{(t)}]$ is small for almost every $s, t$.
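The erasure-indicator view of Lemma 1 and Corollary 1 can be checked exactly. The sketch below uses exact rational arithmetic and function names of our own choosing (they are not from the paper); it enumerates the four outcomes of two independent copies $E, E'$, recovers the erasure probabilities $2e - e^2$ and $e^2$, and shows that $E^-$ and $E^+$ are positively correlated, with $\mathrm{cov}[E^-, E^+] = e^2\bar{e}^2$, because $E^+ = 1$ forces $E^- = 1$.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch; names are our own, not from the paper.

def minus_plus_law(e):
    """Exact joint law of (E^-, E^+) built from two i.i.d. Bernoulli(e) erasure indicators."""
    joint = {}
    for E, E_prime in product((0, 1), repeat=2):
        p = (e if E else 1 - e) * (e if E_prime else 1 - e)
        e_minus = max(E, E_prime)  # (1a): W^- erases iff either copy erases
        e_plus = min(E, E_prime)   # (1b): W^+ erases iff both copies erase
        joint[(e_minus, e_plus)] = joint.get((e_minus, e_plus), 0) + p
    return joint

def summarize(joint):
    """Return P[E^- = 1], P[E^+ = 1] and cov[E^-, E^+]."""
    p_minus = sum(p for (m, _), p in joint.items() if m == 1)
    p_plus = sum(p for (_, q), p in joint.items() if q == 1)
    p_both = joint.get((1, 1), 0)
    return p_minus, p_plus, p_both - p_minus * p_plus

e = Fraction(1, 3)
print(*summarize(minus_plus_law(e)))  # 5/9 1/9 4/81
```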

In this paper we provide upper bounds on the correlation coefficients defined as:

$$\rho_n^{(s,t)} = \frac{\mathbb{E}[E_n^{(s)} E_n^{(t)}] - \mathbb{E}[E_n^{(s)}]\,\mathbb{E}[E_n^{(t)}]}{\sqrt{\mathrm{var}[E_n^{(s)}]\,\mathrm{var}[E_n^{(t)}]}} \tag{2}$$

and exploit these bounds and the inclusion-exclusion principle to find lower bounds on the block error probability of polar codes. In particular, our bounds are strong enough to show that the sum of the Bhattacharyya parameters of the information channels is a tight estimate of the block error probability.

II. Notation 

Throughout this note, we use uppercase letters (like X) to 
indicate a random variable, and its lowercase version (x) for a 
realization of that random variable. The boldface letters denote 
matrices, vectors or sequences which will be clear from the 
context. 

We denote sets by script-style uppercase letters like $\mathcal{S}$, and by $|\mathcal{S}|$ we mean the cardinality of $\mathcal{S}$.

We often use the bar notation defined as $\bar{x} = 1 - x$ for the sake of brevity. We refer to $\bar{x}$ as the "complement" of $x$.

For sign sequences $s \in \{-,+\}^*$ and $t \in \{-,+\}^*$, $\mathrm{CP}[s,t]$ denotes their common prefix. Furthermore, let $|s|$ denote the length of a sequence $s$.

III. Properties of Correlation Coefficients 

As we mentioned in Section I, we are interested in analyzing the correlation coefficients matrix of the erasure indicator vector $\mathbf{E}_n = [E_n^{(s)}]$. It is more convenient to index the $N = 2^n$ elements of that vector using sign sequences $s \in \{-,+\}^n$ instead of mapping the sign sequences to integers and using the natural indexing. We will use the same indexing for the $N^2$ elements of the correlation coefficients matrix.

Arıkan has already shown that the vector $\mathbf{Z}_n = \mathbb{E}[\mathbf{E}_n]$ can be computed via a single-step recursion. More precisely, having $\mathbf{Z}_{n-1}$ we can compute the elements of $\mathbf{Z}_n$ as:

$$Z_n^{(s-)} = 2 Z_{n-1}^{(s)} - \big(Z_{n-1}^{(s)}\big)^2, \tag{3a}$$

$$Z_n^{(s+)} = \big(Z_{n-1}^{(s)}\big)^2, \tag{3b}$$

for all $s \in \{-,+\}^{n-1}$, with $Z_0 = e$. Note that (3a) and (3b) can also be derived by taking the expectation of both sides of (1a) and (1b) and using the independence between $E$ and $E'$.
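The recursion (3a)-(3b) is straightforward to run. In the sketch below (our own illustration), sign sequences are encoded as strings over `'-'` and `'+'`, which is an assumed convention rather than anything prescribed by the paper; the function works with floats or with exact `Fraction` inputs.

```python
from fractions import Fraction

# Illustrative sketch; the string encoding of sign sequences is our own convention.

def erasure_probabilities(e, n):
    """Return {s: Z_n^{(s)}} for all sign sequences s in {-,+}^n, using (3a)-(3b)."""
    Z = {"": e}  # Z_0 = e, indexed by the empty sequence
    for _ in range(n):
        nxt = {}
        for s, z in Z.items():
            nxt[s + "-"] = 2 * z - z * z  # (3a)
            nxt[s + "+"] = z * z          # (3b)
        Z = nxt
    return Z

print(erasure_probabilities(0.5, 2))
# {'--': 0.9375, '-+': 0.5625, '+-': 0.4375, '++': 0.0625}
```

Because $\big(Z_n^{(s-)} + Z_n^{(s+)}\big)/2 = Z_{n-1}^{(s)}$, the average of $\mathbf{Z}_n$ stays equal to $e$; with exact `Fraction` inputs this can be asserted directly, which gives a quick sanity check on any implementation, together with the complemented form $\bar{Z}_n^{(s-)} = (\bar{Z}_{n-1}^{(s)})^2$.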

Interestingly, the correlation coefficients matrix $\boldsymbol{\rho}_n = [\rho_n^{(s,t)}]$ can also be computed via a single-step recursion, as we see in this section.

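It is useful to rewrite (1a) and (1b) as

$$\bar{E}^- = \bar{E} \times \bar{E}', \tag{4a}$$

$$E^+ = E \times E', \tag{4b}$$

and subsequently (3a) and (3b) as:

$$\bar{Z}_n^{(s-)} = \big(\bar{Z}_{n-1}^{(s)}\big)^2, \tag{5a}$$

$$Z_n^{(s+)} = \big(Z_{n-1}^{(s)}\big)^2, \tag{5b}$$

to see the symmetry between the 'minus' and 'plus' transforms.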

Recall that the "covariance" of random variables $X$ and $Y$ is defined as:

$$\mathrm{cov}[X, Y] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]. \tag{6}$$



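Lemma 2. Let $X$ and $Y$ be arbitrary random variables and set $U = \bar{X}$ and $V = \bar{Y}$. Then:

$$\mathrm{var}[U] = \mathrm{var}[X]. \tag{7}$$

Moreover,

$$\mathrm{cov}[U, V] = \mathrm{cov}[X, Y], \tag{8a}$$

$$\mathrm{cov}[X, V] = \mathrm{cov}[U, Y] = -\mathrm{cov}[X, Y]. \tag{8b}$$

Proof: It is clear that $\mathbb{E}[U] = 1 - \mathbb{E}[X]$ and $\mathbb{E}[V] = 1 - \mathbb{E}[Y]$. (7) is also trivial since $\mathrm{var}[aX + b] = a^2\,\mathrm{var}[X]$ for any constants $a$ and $b$. Furthermore:

$$\mathbb{E}[UV] = \mathbb{E}[(1-X)(1-Y)] = 1 - \mathbb{E}[X] - \mathbb{E}[Y] + \mathbb{E}[XY],$$

hence $\mathrm{cov}[U, V] = \mathbb{E}[UV] - \mathbb{E}[U]\,\mathbb{E}[V] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y] = \mathrm{cov}[X, Y]$, which proves (8a). Likewise,

$$\mathbb{E}[UY] = \mathbb{E}[(1-X)Y] = \mathbb{E}[Y] - \mathbb{E}[XY],$$

which shows $\mathrm{cov}[U, Y] = \mathbb{E}[UY] - \mathbb{E}[U]\,\mathbb{E}[Y] = -\mathbb{E}[XY] + \mathbb{E}[X]\,\mathbb{E}[Y] = -\mathrm{cov}[X, Y]$. The same argument applies to $\mathrm{cov}[X, V]$, which proves (8b). ■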



Corollary 2. Let $X$, $Y$, $U$ and $V$ be defined as in Lemma 2, and let $\rho[X, Y] = \frac{\mathrm{cov}[X, Y]}{\sqrt{\mathrm{var}[X]\,\mathrm{var}[Y]}}$ denote the correlation coefficient between random variables $X$ and $Y$. Then:

$$\rho[U, V] = \rho[X, Y], \tag{9a}$$

$$\rho[X, V] = \rho[U, Y] = -\rho[X, Y]. \tag{9b}$$
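A quick exact check of Lemma 2 on one arbitrary joint law is sketched below; Corollary 2 then follows because dividing by the (unchanged, by (7)) standard deviations preserves the equalities. The probabilities in `pmf` are made-up illustration values and the helper names are our own.

```python
from fractions import Fraction

# Illustrative sketch; helper names and the example pmf are hypothetical.

def expect(pmf, h):
    """E[h(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * h(x, y) for (x, y), p in pmf.items())

def cov(pmf, f, g):
    """cov[f, g] = E[fg] - E[f]E[g], as in (6)."""
    return expect(pmf, lambda x, y: f(x, y) * g(x, y)) - expect(pmf, f) * expect(pmf, g)

# A hypothetical joint law of two {0,1}-valued variables (any law would do).
pmf = {(0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
       (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10)}

def X(x, y): return x
def Y(x, y): return y
def U(x, y): return 1 - x  # U = complement of X
def V(x, y): return 1 - y  # V = complement of Y

assert cov(pmf, U, U) == cov(pmf, X, X)                     # (7)
assert cov(pmf, U, V) == cov(pmf, X, Y)                     # (8a)
assert cov(pmf, X, V) == cov(pmf, U, Y) == -cov(pmf, X, Y)  # (8b)
print(cov(pmf, X, Y))  # -1/50
```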

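Lemma 3. The covariance matrix of the random vector $\mathbf{E}_n$, $\mathbf{C}_n \triangleq [C_n^{(s,t)}]$, where

$$C_n^{(s,t)} = \mathrm{cov}[E_n^{(s)}, E_n^{(t)}],$$

can be computed in terms of $\mathbf{C}_{n-1}$ and $\mathbf{Z}_{n-1}$ as follows:

$$C_n^{(s-,t-)} = 2\bar{Z}_{n-1}^{(s)} \bar{Z}_{n-1}^{(t)} C_{n-1}^{(s,t)} + \big(C_{n-1}^{(s,t)}\big)^2, \tag{10a}$$

$$C_n^{(s-,t+)} = 2\bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)} C_{n-1}^{(s,t)} - \big(C_{n-1}^{(s,t)}\big)^2, \tag{10b}$$

$$C_n^{(s+,t-)} = 2 Z_{n-1}^{(s)} \bar{Z}_{n-1}^{(t)} C_{n-1}^{(s,t)} - \big(C_{n-1}^{(s,t)}\big)^2, \tag{10c}$$

$$C_n^{(s+,t+)} = 2 Z_{n-1}^{(s)} Z_{n-1}^{(t)} C_{n-1}^{(s,t)} + \big(C_{n-1}^{(s,t)}\big)^2. \tag{10d}$$

It is clear that $C_0 = e\bar{e}$, where $e$ is the erasure probability of the underlying BEC.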



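Proof: We first prove (10d) and then show how the rest of the results easily follow using Lemma 2.

Recall that $E_n^{(s+)} = E_{n-1}^{(s)} \times E_{n-1}^{\prime(s)}$ and $E_n^{(t+)} = E_{n-1}^{(t)} \times E_{n-1}^{\prime(t)}$. Furthermore, $\mathbb{E}[E_{n-1}^{(s)}] = Z_{n-1}^{(s)}$ and $\mathbb{E}[E_{n-1}^{(t)}] = Z_{n-1}^{(t)}$. Hence

$$\begin{aligned}
\mathrm{cov}[E_n^{(s+)}, E_n^{(t+)}] &= \mathbb{E}\big[E_{n-1}^{(s)} E_{n-1}^{\prime(s)} E_{n-1}^{(t)} E_{n-1}^{\prime(t)}\big] - \big(Z_{n-1}^{(s)} Z_{n-1}^{(t)}\big)^2 \\
&\overset{(*)}{=} \big(\mathbb{E}[E_{n-1}^{(s)} E_{n-1}^{(t)}]\big)^2 - \big(Z_{n-1}^{(s)} Z_{n-1}^{(t)}\big)^2 \\
&= \big(C_{n-1}^{(s,t)} + Z_{n-1}^{(s)} Z_{n-1}^{(t)}\big)^2 - \big(Z_{n-1}^{(s)} Z_{n-1}^{(t)}\big)^2 \\
&= \big(C_{n-1}^{(s,t)}\big)^2 + 2 Z_{n-1}^{(s)} Z_{n-1}^{(t)} C_{n-1}^{(s,t)}.
\end{aligned}$$

Note that in (*) we have used the independence between the indicator variables with a prime and those without, and the fact that they are identically distributed copies of the same random vector.

Now observe that $\bar{E}_n^{(s-)} = \bar{E}_{n-1}^{(s)} \times \bar{E}_{n-1}^{\prime(s)}$ and $\bar{E}_n^{(t-)} = \bar{E}_{n-1}^{(t)} \times \bar{E}_{n-1}^{\prime(t)}$. To compute $C_n^{(s-,t-)}$, using (8a) we have:

$$\begin{aligned}
\mathrm{cov}[E_n^{(s-)}, E_n^{(t-)}] &= \mathrm{cov}[\bar{E}_n^{(s-)}, \bar{E}_n^{(t-)}] = \mathrm{cov}\big[\bar{E}_{n-1}^{(s)} \bar{E}_{n-1}^{\prime(s)},\ \bar{E}_{n-1}^{(t)} \bar{E}_{n-1}^{\prime(t)}\big] \\
&\overset{(*)}{=} \big(C_{n-1}^{(s,t)}\big)^2 + 2\bar{Z}_{n-1}^{(s)} \bar{Z}_{n-1}^{(t)} C_{n-1}^{(s,t)},
\end{aligned}$$

where (*) follows by observing that we are essentially computing the same covariance as the one we just computed to show (10d), considering the facts that (i) $\mathrm{cov}[\bar{E}_{n-1}^{(s)}, \bar{E}_{n-1}^{(t)}] = \mathrm{cov}[E_{n-1}^{(s)}, E_{n-1}^{(t)}]$ (using (8a) once again) and (ii) $\mathbb{E}[\bar{E}_{n-1}^{(s)}] = \bar{Z}_{n-1}^{(s)}$ and $\mathbb{E}[\bar{E}_{n-1}^{(t)}] = \bar{Z}_{n-1}^{(t)}$.

Likewise, (10b) (and similarly (10c)) follows using (8b):

$$\begin{aligned}
\mathrm{cov}[E_n^{(s-)}, E_n^{(t+)}] &= -\mathrm{cov}[\bar{E}_n^{(s-)}, E_n^{(t+)}] = -\mathrm{cov}\big[\bar{E}_{n-1}^{(s)} \bar{E}_{n-1}^{\prime(s)},\ E_{n-1}^{(t)} E_{n-1}^{\prime(t)}\big] \\
&\overset{(*)}{=} -\Big[\big(C_{n-1}^{(s,t)}\big)^2 - 2\bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)} C_{n-1}^{(s,t)}\Big] \\
&= 2\bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)} C_{n-1}^{(s,t)} - \big(C_{n-1}^{(s,t)}\big)^2.
\end{aligned}$$

Once again, in (*) we are computing the same form of covariance as the one we did to show (10d), considering the fact that $\mathrm{cov}[\bar{E}_{n-1}^{(s)}, E_{n-1}^{(t)}] = -\mathrm{cov}[E_{n-1}^{(s)}, E_{n-1}^{(t)}] = -C_{n-1}^{(s,t)}$ (by (8b)). ■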
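Lemma 3 lends itself to an exact sanity check for small $n$: build the indicator vector $\mathbf{E}_n$ directly from $2^n$ i.i.d. base erasures through (1a)-(1b), letting the two halves of the base pattern play the roles of the independent copies, and compare the enumerated covariances with the recursion (10a)-(10d). The sketch below is illustrative (function names are ours), and full enumeration is only feasible for $n \le 3$ or so.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch; names are our own, not from the paper.

def indicators(x):
    """Map a base erasure pattern x (tuple of 0/1 of length 2^n) to {s: E_n^{(s)}}.

    The two halves of x drive the two independent copies E_{n-1} and E'_{n-1}."""
    if len(x) == 1:
        return {"": x[0]}
    half = len(x) // 2
    a, b = indicators(x[:half]), indicators(x[half:])
    out = {}
    for s in a:
        out[s + "-"] = max(a[s], b[s])  # (1a)
        out[s + "+"] = min(a[s], b[s])  # (1b)
    return out

def brute_force_cov(e, n):
    """Exact covariance matrix of E_n by enumerating all 2^(2^n) base erasure patterns."""
    mean, second = {}, {}
    for x in product((0, 1), repeat=2 ** n):
        p = Fraction(1)
        for xi in x:
            p *= e if xi else 1 - e
        E = indicators(x)
        for s in E:
            mean[s] = mean.get(s, 0) + p * E[s]
            for t in E:
                second[(s, t)] = second.get((s, t), 0) + p * E[s] * E[t]
    return {(s, t): m2 - mean[s] * mean[t] for (s, t), m2 in second.items()}

def recursive_cov(e, n):
    """Covariance matrix of E_n via (3a)-(3b) and (10a)-(10d), with C_0 = e(1 - e)."""
    Z, C = {"": e}, {("", ""): e * (1 - e)}
    for _ in range(n):
        Cn = {}
        for (s, t), c in C.items():
            a, b = Z[s], Z[t]
            Cn[(s + "-", t + "-")] = 2 * (1 - a) * (1 - b) * c + c * c  # (10a)
            Cn[(s + "-", t + "+")] = 2 * (1 - a) * b * c - c * c        # (10b)
            Cn[(s + "+", t + "-")] = 2 * a * (1 - b) * c - c * c        # (10c)
            Cn[(s + "+", t + "+")] = 2 * a * b * c + c * c              # (10d)
        Z = {s + sn: v for s, z in Z.items()
             for sn, v in (("-", 2 * z - z * z), ("+", z * z))}
        C = Cn
    return C

e = Fraction(1, 3)
print(all(brute_force_cov(e, n) == recursive_cov(e, n) for n in (1, 2, 3)))  # True
```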



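Corollary 3. The correlation coefficients matrix of the random vector $\mathbf{E}_n$, defined as $\boldsymbol{\rho}_n = [\rho_n^{(s,t)}]$ (where $\rho_n^{(s,t)}$ is defined in (2)), can be computed in terms of $\boldsymbol{\rho}_{n-1}$ and $\mathbf{Z}_{n-1}$ as:

$$\rho_n^{(s-,t-)} = 2\sqrt{\frac{\bar{Z}_{n-1}^{(s)}}{1+\bar{Z}_{n-1}^{(s)}}}\sqrt{\frac{\bar{Z}_{n-1}^{(t)}}{1+\bar{Z}_{n-1}^{(t)}}}\;\rho_{n-1}^{(s,t)} + \sqrt{\frac{Z_{n-1}^{(s)}}{1+\bar{Z}_{n-1}^{(s)}}}\sqrt{\frac{Z_{n-1}^{(t)}}{1+\bar{Z}_{n-1}^{(t)}}}\;\big(\rho_{n-1}^{(s,t)}\big)^2, \tag{11a}$$

$$\rho_n^{(s-,t+)} = 2\sqrt{\frac{\bar{Z}_{n-1}^{(s)}}{1+\bar{Z}_{n-1}^{(s)}}}\sqrt{\frac{Z_{n-1}^{(t)}}{1+Z_{n-1}^{(t)}}}\;\rho_{n-1}^{(s,t)} - \sqrt{\frac{Z_{n-1}^{(s)}}{1+\bar{Z}_{n-1}^{(s)}}}\sqrt{\frac{\bar{Z}_{n-1}^{(t)}}{1+Z_{n-1}^{(t)}}}\;\big(\rho_{n-1}^{(s,t)}\big)^2, \tag{11b}$$

$$\rho_n^{(s+,t-)} = 2\sqrt{\frac{Z_{n-1}^{(s)}}{1+Z_{n-1}^{(s)}}}\sqrt{\frac{\bar{Z}_{n-1}^{(t)}}{1+\bar{Z}_{n-1}^{(t)}}}\;\rho_{n-1}^{(s,t)} - \sqrt{\frac{\bar{Z}_{n-1}^{(s)}}{1+Z_{n-1}^{(s)}}}\sqrt{\frac{Z_{n-1}^{(t)}}{1+\bar{Z}_{n-1}^{(t)}}}\;\big(\rho_{n-1}^{(s,t)}\big)^2, \tag{11c}$$

$$\rho_n^{(s+,t+)} = 2\sqrt{\frac{Z_{n-1}^{(s)}}{1+Z_{n-1}^{(s)}}}\sqrt{\frac{Z_{n-1}^{(t)}}{1+Z_{n-1}^{(t)}}}\;\rho_{n-1}^{(s,t)} + \sqrt{\frac{\bar{Z}_{n-1}^{(s)}}{1+Z_{n-1}^{(s)}}}\sqrt{\frac{\bar{Z}_{n-1}^{(t)}}{1+Z_{n-1}^{(t)}}}\;\big(\rho_{n-1}^{(s,t)}\big)^2. \tag{11d}$$

Clearly $\rho_0 = 1$.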



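Proof: Once again we only prove (11d); the rest follow by symmetry using Corollary 2. Since the $E_n^{(s)}$'s are $\{0, 1\}$-valued random variables with mean $Z_n^{(s)}$,

$$\mathrm{var}[E_n^{(s)}] = Z_n^{(s)} \bar{Z}_n^{(s)}. \tag{12}$$

Setting $C_n^{(s,t)} = \rho_n^{(s,t)} \sqrt{Z_n^{(s)} \bar{Z}_n^{(s)} Z_n^{(t)} \bar{Z}_n^{(t)}}$ on both sides of (10d) and using the fact that $Z_n^{(s+)} = \big(Z_{n-1}^{(s)}\big)^2$ (and similarly $Z_n^{(t+)} = \big(Z_{n-1}^{(t)}\big)^2$), we get:

$$\begin{aligned}
&Z_{n-1}^{(s)} Z_{n-1}^{(t)} \sqrt{\Big(1 - \big(Z_{n-1}^{(s)}\big)^2\Big)\Big(1 - \big(Z_{n-1}^{(t)}\big)^2\Big)}\ \rho_n^{(s+,t+)} \\
&\qquad = 2 Z_{n-1}^{(s)} Z_{n-1}^{(t)} \sqrt{Z_{n-1}^{(s)} \bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)} \bar{Z}_{n-1}^{(t)}}\ \rho_{n-1}^{(s,t)} + \Big(Z_{n-1}^{(s)} \bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)} \bar{Z}_{n-1}^{(t)}\Big) \big(\rho_{n-1}^{(s,t)}\big)^2.
\end{aligned}$$

Eliminating $Z_{n-1}^{(s)} Z_{n-1}^{(t)}$ from both sides and observing that $1 - x^2 = \bar{x}(1+x)$, so that $\frac{\sqrt{x\bar{x}}}{\sqrt{1-x^2}} = \sqrt{\frac{x}{1+x}}$ and $\frac{\bar{x}}{\sqrt{1-x^2}} = \sqrt{\frac{\bar{x}}{1+x}}$, proves the claim. ■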
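Corollary 3 can be cross-checked numerically against Lemma 3: propagate $\boldsymbol{\rho}_n$ with (11a)-(11d) and compare it with $C_n^{(s,t)}/\sqrt{Z_n^{(s)}\bar{Z}_n^{(s)}Z_n^{(t)}\bar{Z}_n^{(t)}}$ obtained from (10a)-(10d) and (12). This is a floating-point sketch with names of our own choosing; `f` and `g` are our shorthand for the two kinds of square-root factors in (11d), $f(x) = \sqrt{x/(1+x)}$ and $g(x) = \sqrt{\bar{x}/(1+x)}$.

```python
from math import sqrt, isclose

# Illustrative sketch; f, g and the function names are our own shorthand.

def f(x):
    return sqrt(x / (1 + x))        # f(x) = sqrt(x / (1 + x))

def g(x):
    return sqrt((1 - x) / (1 + x))  # g(x) = sqrt(xbar / (1 + x))

def next_Z(Z):
    """One step of (3a)-(3b)."""
    return {s + sn: v for s, z in Z.items()
            for sn, v in (("-", 2 * z - z * z), ("+", z * z))}

def rho_by_recursion(e, n):
    """rho_n via (11a)-(11d), starting from rho_0 = 1."""
    Z, R = {"": e}, {("", ""): 1.0}
    for _ in range(n):
        Rn = {}
        for (s, t), r in R.items():
            a, b = Z[s], Z[t]
            Rn[(s + "-", t + "-")] = 2*f(1-a)*f(1-b)*r + g(1-a)*g(1-b)*r*r  # (11a)
            Rn[(s + "-", t + "+")] = 2*f(1-a)*f(b)*r - g(1-a)*g(b)*r*r      # (11b)
            Rn[(s + "+", t + "-")] = 2*f(a)*f(1-b)*r - g(a)*g(1-b)*r*r      # (11c)
            Rn[(s + "+", t + "+")] = 2*f(a)*f(b)*r + g(a)*g(b)*r*r          # (11d)
        Z, R = next_Z(Z), Rn
    return R

def rho_from_covariance(e, n):
    """rho_n = C_n / sqrt(var * var), with C_n from (10a)-(10d) and var from (12)."""
    Z, C = {"": e}, {("", ""): e * (1 - e)}
    for _ in range(n):
        Cn = {}
        for (s, t), c in C.items():
            a, b = Z[s], Z[t]
            Cn[(s + "-", t + "-")] = 2*(1-a)*(1-b)*c + c*c
            Cn[(s + "-", t + "+")] = 2*(1-a)*b*c - c*c
            Cn[(s + "+", t + "-")] = 2*a*(1-b)*c - c*c
            Cn[(s + "+", t + "+")] = 2*a*b*c + c*c
        Z, C = next_Z(Z), Cn
    return {(s, t): c / sqrt(Z[s]*(1-Z[s])*Z[t]*(1-Z[t])) for (s, t), c in C.items()}

R1, R2 = rho_by_recursion(0.3, 4), rho_from_covariance(0.3, 4)
print(all(isclose(R1[k], R2[k], rel_tol=1e-9, abs_tol=1e-12) for k in R1))  # True
```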
The property of being computable by a single-step recursion generalizes to higher-order statistics:

Lemma 4. In general, the $m$-th order moments of the random variables $E_n^{(s^n)}$, $s^n \in \{-,+\}^n$, can be computed from the $m$-th order moments of the random variables $E_{n-1}^{(s^{n-1})}$, $s^{n-1} \in \{-,+\}^{n-1}$.

Proof: By the $m$-th order moment we mean:

$$\mathbb{E}\big[E_n^{(s_1^n)} E_n^{(s_2^n)} \cdots E_n^{(s_m^n)}\big]$$

for some set of indices $s_1^n, s_2^n, \ldots, s_m^n$ which are not necessarily distinct.

Let $s_k^{n-1}$ denote the subsequence of $s_k^n$ consisting of its first $n-1$ elements, and observe that for any $k \in \{1, 2, \ldots, m\}$, $E_n^{(s_k^n)}$ is linear in each of $E_{n-1}^{(s_k^{n-1})}$ and $E_{n-1}^{\prime(s_k^{n-1})}$ (cf. (1a) and (1b)). This means that in the expansion of $E_n^{(s_1^n)} E_n^{(s_2^n)} \cdots E_n^{(s_m^n)}$ we will have terms of the form

$$E_{n-1}^{(s_{i_1}^{n-1})} \cdots E_{n-1}^{(s_{i_l}^{n-1})}\; E_{n-1}^{\prime(s_{j_1}^{n-1})} \cdots E_{n-1}^{\prime(s_{j_{l'}}^{n-1})}$$

for some $l \le m$ and $l' \le m$. The independence of the variables with a prime and those without implies that the expectation of such a product is the product of two expectations, each of which is a moment of order at most $m$ of the random variables $E_{n-1}^{(s^{n-1})}$ (a moment of lower order is itself an $m$-th order moment with repeated indices, since $E^2 = E$ for indicators). ■

One can derive the properties of $\boldsymbol{\rho}_n$ stated in the sequel from the aforementioned recursions:



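Property 1.

$$0 \le \rho_n^{(s,t)} \le \min\left\{\sqrt{\frac{Z_n^{(s)} \bar{Z}_n^{(t)}}{\bar{Z}_n^{(s)} Z_n^{(t)}}},\ \sqrt{\frac{\bar{Z}_n^{(s)} Z_n^{(t)}}{Z_n^{(s)} \bar{Z}_n^{(t)}}}\right\}. \tag{13}$$

Property 1 follows as a corollary of the following property of $\mathbf{C}_n$ (divide (14) by $\sqrt{\mathrm{var}[E_n^{(s)}]\,\mathrm{var}[E_n^{(t)}]} = \sqrt{Z_n^{(s)} \bar{Z}_n^{(s)} Z_n^{(t)} \bar{Z}_n^{(t)}}$, using (12)).

Property.

$$0 \le C_n^{(s,t)} \le \min\big\{Z_n^{(s)} \bar{Z}_n^{(t)},\ \bar{Z}_n^{(s)} Z_n^{(t)}\big\}. \tag{14}$$

Proof: We prove the claim by induction on $n$. The claim is trivially true for $n = 0$ since:

$$0 \le C_0 = \mathrm{var}[E_0] = e\bar{e} \le \min\{e\bar{e},\ \bar{e}e\},$$

where $e$ is the erasure probability of the underlying BEC. Now, assuming (14) holds for $n-1$, we shall show:

$$0 \le C_n^{(s-,t-)} \le \min\big\{Z_n^{(s-)} \bar{Z}_n^{(t-)},\ \bar{Z}_n^{(s-)} Z_n^{(t-)}\big\}, \tag{15a}$$

$$0 \le C_n^{(s-,t+)} \le \min\big\{Z_n^{(s-)} \bar{Z}_n^{(t+)},\ \bar{Z}_n^{(s-)} Z_n^{(t+)}\big\}, \tag{15b}$$

$$0 \le C_n^{(s+,t-)} \le \min\big\{Z_n^{(s+)} \bar{Z}_n^{(t-)},\ \bar{Z}_n^{(s+)} Z_n^{(t-)}\big\}, \tag{15c}$$

$$0 \le C_n^{(s+,t+)} \le \min\big\{Z_n^{(s+)} \bar{Z}_n^{(t+)},\ \bar{Z}_n^{(s+)} Z_n^{(t+)}\big\}. \tag{15d}$$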

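As (10d) is obtained from (10a) by replacing both $Z_{n-1}^{(s)}$ and $Z_{n-1}^{(t)}$ by their complements, and (10c) is obtained by swapping $s$ and $t$ in (10b), we only need to prove (15a) and (15b); the rest follow by symmetry. Furthermore, the nonnegativity of $C_n^{(s-,t-)}$ and $C_n^{(s-,t+)}$ is clear from the assumption that (14) holds for $n-1$ and the combination formulae (10a) and (10b) (for (10b), write it as $C_{n-1}^{(s,t)}\big(2\bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)} - C_{n-1}^{(s,t)}\big)$ and use $C_{n-1}^{(s,t)} \le \bar{Z}_{n-1}^{(s)} Z_{n-1}^{(t)}$). So we only verify the upper bounds.

Let $a = Z_{n-1}^{(s)}$, $b = Z_{n-1}^{(t)}$ and $c = C_{n-1}^{(s,t)}$ for the sake of brevity. Note that by definition $0 \le a \le 1$ and $0 \le b \le 1$. However, if either $a$ or $b$ is extremal, then by assumption (14) $c = 0$ and the claim is trivial. So, for the rest of the proof, we safely assume $0 < a < 1$ and $0 < b < 1$.

• To prove (15a) we have to show:

$$2\bar{a}\bar{b}c + c^2 \le \min\big\{\bar{a}^2(2b - b^2),\ (2a - a^2)\bar{b}^2\big\}.$$

The above inequality is symmetric in $a$ and $b$, hence without loss of generality we can assume $a \ge b$, which implies $a\bar{b} \ge \bar{a}b$ and also $(2a - a^2)\bar{b}^2 \ge \bar{a}^2(2b - b^2)$. The LHS of the above inequality is increasing in $c$, hence once we verify the inequality for the maximum possible value of $c$, which by (14) is $\min\{a\bar{b}, \bar{a}b\} = \bar{a}b$, we are done. Replacing $c$ with $\bar{a}b$ we get:

$$2\bar{a}\bar{b}\,\bar{a}b + (\bar{a}b)^2 \le \bar{a}^2(2b - b^2).$$

Cancelling $\bar{a}^2$ from both sides yields $2b - b^2 \le 2b - b^2$, which holds with equality.

• To prove (15b) we need to show:

$$2\bar{a}bc - c^2 \le \min\big\{\bar{a}^2 b^2,\ (2a - a^2)(1 - b^2)\big\} = \min\big\{(\bar{a}b)^2,\ a\bar{b}(1+\bar{a})(1+b)\big\}.$$

As $c \le \bar{a}b$, the LHS is an increasing function of $c$ and we only need to verify the inequality for the maximum possible value of $c$, namely $\min\{a\bar{b}, \bar{a}b\}$.

 - If $\bar{a}b \le a\bar{b}$, the LHS of the inequality will be $(\bar{a}b)^2$ at $c = \bar{a}b$, and:

$$(\bar{a}b)^2 \le (\bar{a}b) \times (a\bar{b}) \le \big[(1+\bar{a})(1+b)\big] \times \big[a\bar{b}\big].$$

 - If $a\bar{b} < \bar{a}b$, then the LHS of our inequality at $c = a\bar{b}$ will be equal to:

$$\begin{aligned}
2\bar{a}b \times a\bar{b} - (a\bar{b})^2 &= a\bar{b}\,\big[2\bar{a}b - a\bar{b}\big] \\
&= a\bar{b}\,\big[\bar{a}b + \bar{a} + b - 1\big] \\
&= a\bar{b}\,\big[(1+\bar{a})(1+b) - 2\big] \\
&\le a\bar{b}\,(1+\bar{a})(1+b).
\end{aligned}$$

Furthermore, as the LHS is increasing in $c$, at $c = a\bar{b}$ it will be less than $(\bar{a}b)^2$ (its value at $c = \bar{a}b$). ■

Remark. This upper bound shows that for almost all choices of $s$ and $t$, $C_n^{(s,t)} = \mathbb{E}[E_n^{(s)} E_n^{(t)}] - \mathbb{E}[E_n^{(s)}]\,\mathbb{E}[E_n^{(t)}]$ goes to zero as $n$ gets large, since by polarization almost all $Z_n^{(s)}$ approach $0$ or $1$, which makes the right-hand side of (14) vanish.

Property 2. For $s, t \in \{-,+\}^{n-1}$ and $s_n, t_n \in \{-,+\}$,

$$\rho_n^{(ss_n,\,tt_n)} \le \rho_{n-1}^{(s,t)},$$

with equality iff

(a) $\rho_{n-1}^{(s,t)} = 0$, or

(b) $s_n = t_n$ and $\rho_{n-1}^{(s,t)} = 1$ and $Z_{n-1}^{(s)} = Z_{n-1}^{(t)}$, or

(c) $Z_{n-1}^{(s)} = b_{s_n}$ and $Z_{n-1}^{(t)} = b_{t_n}$, where $b_- = 0$ and $b_+ = 1$.²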



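Proof: The case $\rho_{n-1}^{(s,t)} = 0$ is trivial. Otherwise, we consider the ratio $\rho_n^{(ss_n,tt_n)}/\rho_{n-1}^{(s,t)}$. Using (11a) to (11d), this ratio is as shown in (16). Let $a = Z_{n-1}^{(s)}$, $b = Z_{n-1}^{(t)}$ and $r = \rho_{n-1}^{(s,t)}$, and observe that:

$$\frac{\rho_n^{(ss_n,tt_n)}}{\rho_{n-1}^{(s,t)}} = \begin{cases}
2\sqrt{\frac{a}{1+a}}\sqrt{\frac{b}{1+b}} + \sqrt{\frac{\bar{a}}{1+a}}\sqrt{\frac{\bar{b}}{1+b}}\;r & \text{if } (s_n,t_n) = (+,+), \\[4pt]
2\sqrt{\frac{a}{1+a}}\sqrt{\frac{\bar{b}}{1+\bar{b}}} - \sqrt{\frac{\bar{a}}{1+a}}\sqrt{\frac{b}{1+\bar{b}}}\;r & \text{if } (s_n,t_n) = (+,-), \\[4pt]
2\sqrt{\frac{\bar{a}}{1+\bar{a}}}\sqrt{\frac{b}{1+b}} - \sqrt{\frac{a}{1+\bar{a}}}\sqrt{\frac{\bar{b}}{1+b}}\;r & \text{if } (s_n,t_n) = (-,+), \\[4pt]
2\sqrt{\frac{\bar{a}}{1+\bar{a}}}\sqrt{\frac{\bar{b}}{1+\bar{b}}} + \sqrt{\frac{a}{1+\bar{a}}}\sqrt{\frac{b}{1+\bar{b}}}\;r & \text{if } (s_n,t_n) = (-,-).
\end{cases} \tag{16}$$

(i) If $(s_n, t_n) = (+,+)$, applying the Cauchy-Schwarz inequality to the RHS of (16) we get:

$$\frac{\rho_n^{(s+,t+)}}{\rho_{n-1}^{(s,t)}} \le \sqrt{\frac{2a + r\bar{a}}{1+a}}\;\sqrt{\frac{2b + r\bar{b}}{1+b}}.$$

As each of the square roots is of the form $\sqrt{\frac{1 + x + (r-1)\bar{x}}{1+x}}$, it is at most one, since the numerator is at most the denominator. For $a \in [0,1]$, $b \in [0,1]$ and $r \in [0,1]$, the product of the square roots is strictly smaller than 1 unless $r = 1$ or $a = b = 1$. Furthermore, the equality condition of the Cauchy-Schwarz inequality requires $\sqrt{a/\bar{a}} = \sqrt{b/\bar{b}}$, which in turn implies $a = b$. Therefore, we can conclude that if $(s_n, t_n) = (+,+)$, then $\rho_n^{(ss_n,tt_n)}/\rho_{n-1}^{(s,t)} \le 1$, with equality iff ($Z_{n-1}^{(s)} = Z_{n-1}^{(t)}$ and $\rho_{n-1}^{(s,t)} = 1$) or ($Z_{n-1}^{(s)} = Z_{n-1}^{(t)} = 1$). The same argument can also be applied to the case $(s_n, t_n) = (-,-)$.

(ii) If $(s_n, t_n) = (+,-)$, the RHS of (16) can be bounded as:

$$\frac{\rho_n^{(s+,t-)}}{\rho_{n-1}^{(s,t)}} \le 2\sqrt{\frac{a}{1+a}}\sqrt{\frac{\bar{b}}{1+\bar{b}}} \le 2 \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 1.$$

The last inequality follows by observing that $\sqrt{\frac{x}{1+x}} \le \frac{1}{\sqrt{2}}$ for $x \in [0,1]$, with equality iff $x = 1$. Furthermore, it is easy to see that equality holds throughout the above chain of inequalities iff $(a, b) = (1, 0)$. By symmetry, this argument also applies to the case $(s_n, t_n) = (-,+)$. ■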
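Properties 1 and 2 are easy to probe numerically. The sketch below is a floating-point check for small $n$, not a proof; the tolerances, names and parameter choices are our assumptions. It propagates $\mathbf{Z}_n$ and $\boldsymbol{\rho}_n$ using the compact form of (11a)-(11d), namely $\rho_n^{(ss_n,tt_n)} = \big[2f(x)f(y) \pm g(x)g(y)\,\rho_{n-1}^{(s,t)}\big]\rho_{n-1}^{(s,t)}$ with $x \in \{a, \bar{a}\}$ and $y \in \{b, \bar{b}\}$ selected by $s_n$ and $t_n$ (this bracket is exactly the ratio in (16)), where $f(x) = \sqrt{x/(1+x)}$ and $g(x) = \sqrt{\bar{x}/(1+x)}$. It then asserts (13) and the monotonicity of Property 2 for every pair.

```python
from math import sqrt

# Illustrative numerical check; tolerances and names are our own assumptions.

def f(x): return sqrt(x / (1 + x))
def g(x): return sqrt((1 - x) / (1 + x))

def next_level(Z, R):
    """One step of (3a)-(3b) and (11a)-(11d); the bracket below is the ratio (16)."""
    Zn = {s + sn: v for s, z in Z.items()
          for sn, v in (("-", 1 - (1 - z) ** 2), ("+", z * z))}  # (5a), (5b)
    Rn = {}
    for (s, t), r in R.items():
        a, b = Z[s], Z[t]
        for sn, x in (("-", 1 - a), ("+", a)):
            for tn, y in (("-", 1 - b), ("+", b)):
                sign = 1 if sn == tn else -1
                Rn[(s + sn, t + tn)] = (2 * f(x) * f(y) + sign * g(x) * g(y) * r) * r
    return Zn, Rn

def check_properties_1_and_2(e, n, rel=1e-9, tiny=1e-12):
    Z, R = {"": e}, {("", ""): 1.0}
    for _ in range(n):
        Zn, Rn = next_level(Z, R)
        for (s, t), r in Rn.items():
            zs, zt = Zn[s], Zn[t]
            u = sqrt(zs * (1 - zt) / ((1 - zs) * zt))
            bound = min(u, 1 / u)                                # RHS of (13)
            assert -tiny <= r <= bound * (1 + rel) + tiny        # Property 1
            assert r <= R[(s[:-1], t[:-1])] * (1 + rel) + tiny   # Property 2
        Z, R = Zn, Rn
    return True

print(check_properties_1_and_2(0.5, 4), check_properties_1_and_2(0.3, 4))  # True True
```

The bound (13) is attained for nested erasure events (for example $\rho_1^{(-,+)} = 1/3$ when $e = 1/2$), which is why the check compares with a small relative tolerance rather than strict inequality.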



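Property 3. If $s \ne t$, then $\rho_n^{(s,t)} \le \frac{1}{3}$.

Proof: Let $p = \mathrm{CP}[s,t]$ be the common prefix of $s$ and $t$, and let $m = |p|$ be its length. Then $s_{m+1} \ne t_{m+1}$, and Property 2 together with either (11b) or (11c) (applied with $\rho_m^{(p,p)} = 1$) results in:

$$\rho_n^{(s,t)} \le \rho_{m+1}^{(ps_{m+1},\,pt_{m+1})} = \sqrt{\frac{Z_m^{(p)} \bar{Z}_m^{(p)}}{\big(1 + Z_m^{(p)}\big)\big(1 + \bar{Z}_m^{(p)}\big)}} = \sqrt{\frac{Z_m^{(p)} \bar{Z}_m^{(p)}}{2 + Z_m^{(p)} \bar{Z}_m^{(p)}}} \le \frac{1}{3},$$

where the last inequality holds with equality iff $Z_m^{(p)} = \frac{1}{2}$. ■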
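The two equalities and the final bound in the proof of Property 3 can be checked on a grid. The sketch below (our own illustration) evaluates (11b) with $\rho_m^{(p,p)} = 1$ and $a = b = z$, compares it with the closed form $\sqrt{z\bar{z}/(2 + z\bar{z})}$, and locates the maximizer.

```python
from math import sqrt, isclose

# Illustrative sketch; function names are our own.

def rho_after_split(z):
    """rho_{m+1}^{(p-,p+)} from (11b) with rho_m^{(p,p)} = 1 and Z_m^{(p)} = z."""
    zb = 1 - z
    return (2 * sqrt(zb / (1 + zb)) * sqrt(z / (1 + z))
            - sqrt(z / (1 + zb)) * sqrt(zb / (1 + z)))

def closed_form(z):
    """sqrt(z * zbar / (2 + z * zbar)), the expression in the proof of Property 3."""
    u = z * (1 - z)
    return sqrt(u / (2 + u))

grid = [i / 1000 for i in range(1001)]
assert all(isclose(rho_after_split(z), closed_form(z), abs_tol=1e-12) for z in grid)
peak = max(grid, key=closed_form)
print(peak, isclose(closed_form(peak), 1 / 3))  # 0.5 True
```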

IV. Convergence of Correlation Coefficients 



In the previous section we showed how the correlation coefficients can be computed efficiently by single-step recursions and derived some of their algebraic properties. In this section we show that the correlation coefficients between distinct channels converge to zero.

Lemma 5. Let $s$ and $t$ be infinite sign sequences such that $s \ne t$, and let $s^n$ and $t^n$ be the subsequences corresponding to their first $n$ elements, respectively. Then $\lim_{n\to\infty} \rho_n^{(s^n,t^n)} = 0$.

Proof: Let $m = |\mathrm{CP}[s,t]|$ and $a_n = \rho_n^{(s^n,t^n)}$. For $n > m$, by Properties 1 and 3 we know $a_n \in [0, 1/3]$, and by Property 2 it is non-increasing. Hence $a_n$ is a convergent sequence. Suppose its limit is $a^* > 0$. This implies that for every $\epsilon > 0$ there exists an $n_0$ such that for $n > n_0$, $a_n/a_{n-1} > 1 - \epsilon$. Conditions (a) and (b) of Property 2 are excluded since $a^* > 0$ and $a_{n-1} \le 1/3$, so by the continuity of (16) and the equality conditions of Property 2, we must have $\big|Z_{n-1}^{(s^{n-1})} - b_{s_n}\big| < \delta$ and $\big|Z_{n-1}^{(t^{n-1})} - b_{t_n}\big| < \delta$ for all $n > n_0$, where $\delta$ is a quantity approaching zero as $\epsilon$ gets small. This implies $s_n = s^*$ and $t_n = t^*$ for all $n > n_0$, because the evolution of $Z$ does not allow $Z$ to jump from one extreme to the other. Without loss of generality, assume $s^* = +$, which in turn requires $Z_{n-1}^{(s^{n-1})} > 1 - \delta$. Now we have an incompatible situation: $s_n = +$ for all $n > n_0$ will drive $Z_n^{(s^n)}$ to $0$. This shows that $a_n$ cannot converge to a nonzero value. ■

² By Property 1, this condition implies $\rho_{n-1}^{(s,t)} = 0$.
Additionally we can show that the average of the elements 
of the correlation coefficients matrix is exponentially small in 
n. 

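Lemma 6. For any $s, t \in \{-,+\}^{n-1}$,

$$\frac{1}{4} \sum_{(\sigma,\tau) \in \{-,+\}^2} \rho_n^{(s\sigma,\,t\tau)} \le \frac{2}{3}\,\rho_{n-1}^{(s,t)}.$$

Proof: Let $a = Z_{n-1}^{(s)}$, $b = Z_{n-1}^{(t)}$, $f(x) = \sqrt{\frac{2x}{1+x}} + \sqrt{\frac{2\bar{x}}{1+\bar{x}}}$ and $g(x) = \sqrt{\frac{\bar{x}}{1+x}} - \sqrt{\frac{x}{1+\bar{x}}}$. Using (11a) to (11d), one can easily verify that:

$$\sum_{(\sigma,\tau) \in \{-,+\}^2} \rho_n^{(s\sigma,\,t\tau)} = f(a)f(b)\,\rho_{n-1}^{(s,t)} + g(a)g(b)\,\big(\rho_{n-1}^{(s,t)}\big)^2 = \Big[f(a)f(b) + g(a)g(b)\,\rho_{n-1}^{(s,t)}\Big]\,\rho_{n-1}^{(s,t)}.$$

Now observe that both sides of the above are nonnegative and:

$$f(a)f(b) + g(a)g(b)\,\rho_{n-1}^{(s,t)} \overset{(*)}{\le} \sqrt{\Big[f(a)^2 + \rho_{n-1}^{(s,t)} g(a)^2\Big]\Big[f(b)^2 + \rho_{n-1}^{(s,t)} g(b)^2\Big]} \le \sqrt{\Big[f(a)^2 + g(a)^2\Big]\Big[f(b)^2 + g(b)^2\Big]},$$

where (*) follows from the Cauchy-Schwarz inequality. It is easy to see that $f(x)^2 + g(x)^2 = 2\Big(1 + \sqrt{\frac{x\bar{x}}{2 + x\bar{x}}}\Big)$, which is maximized at $x = \frac{1}{2}$ (for $x \in [0,1]$) with value $\frac{8}{3}$. ■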

Corollary 4. The average of the normalized correlation matrix elements satisfies:

$$\frac{1}{4^n} \sum_{s,t \in \{-,+\}^n} \rho_n^{(s,t)} \le \left(\frac{2}{3}\right)^n.$$

Proof: The result follows by applying Lemma 6 $n$ times and observing that $\rho_0 = 1$. ■
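Corollary 4 can be observed numerically for small $n$. The sketch below (an illustration with our own names, in floating point) propagates $\boldsymbol{\rho}_k$ with (11a)-(11d), written compactly via the ratio (16) with $f(x) = \sqrt{x/(1+x)}$ and $g(x) = \sqrt{\bar{x}/(1+x)}$ (these are the Lemma 8 functions, not the ones in the proof of Lemma 6), and prints the average over all $4^k$ pairs next to $(2/3)^k$.

```python
from math import sqrt

# Illustrative sketch; names and the choice e = 0.5 are our own.

def f(x): return sqrt(x / (1 + x))
def g(x): return sqrt((1 - x) / (1 + x))

def average_rho(e, n):
    """[average of rho_k^{(s,t)} over all 4^k pairs] for k = 0..n, via (11a)-(11d)."""
    Z, R = {"": e}, {("", ""): 1.0}
    averages = [1.0]
    for _ in range(n):
        Rn = {}
        for (s, t), r in R.items():
            a, b = Z[s], Z[t]
            for sn, x in (("-", 1 - a), ("+", a)):
                for tn, y in (("-", 1 - b), ("+", b)):
                    sign = 1 if sn == tn else -1
                    Rn[(s + sn, t + tn)] = (2 * f(x) * f(y) + sign * g(x) * g(y) * r) * r
        Z = {s + sn: v for s, z in Z.items()
             for sn, v in (("-", 1 - (1 - z) ** 2), ("+", z * z))}
        R = Rn
        averages.append(sum(R.values()) / len(R))
    return averages

for k, avg in enumerate(average_rho(0.5, 6)):
    print(k, "%.4f" % avg, "%.4f" % ((2 / 3) ** k))
```

Summing Lemma 6 over all parent pairs gives the stronger step-wise statement that each average is at most $2/3$ of the previous one, which is what the assertions check.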



V. Rate of Convergence 

Corollary 4 implies that for large enough $n$, almost all of the non-diagonal entries of $\boldsymbol{\rho}_n$ are small. However, the bound it gives is not strong enough to show the asymptotic tightness of the union bound on the block error probability of polar codes. For that, one has to show (i) that the correlations decay like $O(2^{-(1+\alpha)n})$ for some $\alpha > 0$, and (ii) that this bound applies not just to the average value of $\rho_n^{(s,t)}$ but to $\max_{t \ne s} \rho_n^{(s,t)}$ for the $s$'s and $t$'s which index the information channels.

To this end, we establish a probabilistic framework similar to that used in [1] for proving the channel polarization theorem.

Let $S_1, S_2, \ldots$ be i.i.d. Bernoulli($\frac{1}{2}$) random variables such that $S_i \in \{-,+\}$, define $S^n = (S_1, S_2, \ldots, S_n)$, and let $\mathcal{F}_n = \sigma(S^n)$ be the $\sigma$-algebra generated by the random vector $S^n$. We consider the random variables $Z_n^{(S^n)}$ and $\rho_n^{(S^n,t^n)}$ for $t^n \in \{-,+\}^n$, which are all $\mathcal{F}_n$-measurable.

We show that for any $\alpha > 0$, $\max_{t^n \ne S^n} \rho_n^{(S^n,t^n)} < 2^{-(1+\alpha)n}$ with very high probability for sufficiently large $n$.



A. Closely related s and t 

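Let us first focus on $\rho_n^{(s,t)}$ for $s$ and $t$ sharing a long common prefix. Recall that $|\mathrm{CP}[s,t]|$ denotes the length of this prefix.

Lemma 7. Fix $\alpha > 0$. Set $m_n = 4\log_2(2(1+\alpha)n - 1)$. Then:

$$\lim_{n\to\infty} \mathbb{P}\Big[\max_{t^n \ne S^n:\ |\mathrm{CP}[S^n,t^n]| \ge m_n} \rho_n^{(S^n,t^n)} < 2^{-(1+\alpha)n}\Big] = 1.$$

Proof: Let $P = \mathrm{CP}[S^n,t^n]$ and $n_0 = |P|$. Observe that $P$ is a uniformly chosen sign sequence in $\{-,+\}^{n_0}$. According to Property 2 (and the computation in the proof of Property 3), since $\rho_{n_0}^{(P,P)} = 1$:

$$\rho_n^{(S^n,t^n)} \le \rho_{n_0+1}^{(PS_{n_0+1},\,Pt_{n_0+1})} = \sqrt{\frac{Z_{n_0}^{(P)} \bar{Z}_{n_0}^{(P)}}{2 + Z_{n_0}^{(P)} \bar{Z}_{n_0}^{(P)}}} \le \sqrt{\frac{Z_{n_0}^{(P)} \bar{Z}_{n_0}^{(P)}}{2}}.$$

The results of [2] show that for any fixed $0 < \beta < 1/2$ and $\delta > 0$ there exists an $m_0$ such that for $n_0 > m_0$

$$\mathbb{P}\Big[Z_{n_0}^{(P)} \in \big[2^{-N_0^{\beta}},\ 1 - 2^{-N_0^{\beta}}\big]\Big] < \delta,$$

where $N_0 = 2^{n_0}$.

In particular, we take $\beta = \frac{1}{4}$ in the above bound and take $n$ large enough so that $m_n > m_0$. Hence $n_0 \ge m_n > m_0$, and with probability at least $1 - \delta$, $Z_{n_0}^{(P)}$ is extremal, i.e., $\min\{Z_{n_0}^{(P)}, \bar{Z}_{n_0}^{(P)}\} < 2^{-N_0^{1/4}}$. Together with $2^{-N_0^{1/4}} \le 2^{-2(1+\alpha)n + 1}$ (which holds since $N_0^{1/4} = 2^{n_0/4} \ge 2^{m_n/4} = 2(1+\alpha)n - 1$), we get

$$\mathbb{P}\Big[\rho_n^{(S^n,t^n)} < 2^{-(1+\alpha)n}\Big] > 1 - \delta. \qquad ■$$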



B. Distantly related s and t 

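A more involved task is to find an upper bound on $\rho_n^{(s,t)}$ when $s$ and $t$ do not have a long common prefix. For this purpose, we first seek an upper bound on $\rho_n^{(S^n,t^n)}/\rho_{n-1}^{(S^{n-1},t^{n-1})}$ only in terms of $S^{n-1}$, $S_n$ and $p_n = |\mathrm{CP}[S^n,t^n]|$, denoted as $\chi(S^{n-1}, S_n, p_n)$.

To this end, let:

$$M(s, t, r, a, b) \triangleq \frac{\rho_n^{(S^{n-1}s,\,t^{n-1}t)}}{\rho_{n-1}^{(S^{n-1},t^{n-1})}}, \qquad a = Z_{n-1}^{(S^{n-1})},\quad b = Z_{n-1}^{(t^{n-1})},\quad r = \rho_{n-1}^{(S^{n-1},t^{n-1})},$$

where here $s, t \in \{-,+\}$ are single signs. $M(s, t, r, a, b)$ takes four possible forms according to (16), each of which can be bounded using Lemma 8 (and the triangle inequality if $s \ne t$) as:

$$M(+, t, r, a, b) \le \min\big\{1,\ \sqrt{2a} + r\big\},$$

$$M(-, t, r, a, b) \le \min\big\{1,\ \sqrt{2\bar{a}} + r\big\}.$$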
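Lemma 8. Let $f(x) = \sqrt{\frac{x}{1+x}}$ and $g(x) = \sqrt{\frac{\bar{x}}{1+x}}$. Define

$$F(r, a, b) \triangleq 2f(a)f(b) + g(a)g(b)\,r.$$

Then

$$F(r, a, b) \le \min\big\{1,\ \sqrt{2a} + r\big\} \tag{17}$$

for all $0 \le r \le 1$, $0 \le a \le 1$, $0 \le b \le 1$.

Proof: Observe that $F(r, a, b) \ge 0$ by construction and:

$$F(r,a,b)^2 = \big(2f(a)f(b) + g(a)g(b)\,r\big)^2 \le \big(2f(a)f(b) + g(a)g(b)\big)^2 \overset{(*)}{\le} \big(2f(a)^2 + g(a)^2\big)\big(2f(b)^2 + g(b)^2\big),$$

where (*) follows by the Cauchy-Schwarz inequality. Furthermore, $2f(x)^2 + g(x)^2 = \frac{2x}{1+x} + \frac{\bar{x}}{1+x} = 1$, which proves $F(r, a, b) \le 1$.

It is also easy to verify that $f(x) \le \frac{1}{\sqrt{2}}$ and $g(x) \le 1$ for all $x \in [0, 1]$. Hence:

$$F(r, a, b) \le \sqrt{2}\,f(a) + r \le \sqrt{2a} + r,$$

where the last inequality follows by observing that $\sqrt{\frac{x}{1+x}} \le \sqrt{x}$ since $x \ge 0$. ■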
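A brute-force grid check of (17), and of the identity $2f(x)^2 + g(x)^2 = 1$ used in the proof, is sketched below. The grid resolution and the rounding tolerance are our own choices; this illustrates the lemma rather than proving it.

```python
from math import sqrt

# Illustrative grid check; resolution and tolerance are our own assumptions.

def f(x): return sqrt(x / (1 + x))
def g(x): return sqrt((1 - x) / (1 + x))

def F(r, a, b):
    """F(r, a, b) = 2 f(a) f(b) + g(a) g(b) r, as in Lemma 8."""
    return 2 * f(a) * f(b) + g(a) * g(b) * r

grid = [i / 40 for i in range(41)]
# Largest violation of (17) over the grid (should be <= 0 up to rounding).
worst = max(F(r, a, b) - min(1.0, sqrt(2 * a) + r)
            for r in grid for a in grid for b in grid)
# The identity 2 f(x)^2 + g(x)^2 = 1 used in the proof.
identity_err = max(abs(2 * f(x) ** 2 + g(x) ** 2 - 1) for x in grid)
print(worst <= 1e-12, identity_err < 1e-12)  # True True
```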

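Observe that the upper bounds on $M$ depend only on $Z_{n-1}^{(S^{n-1})}$ and $\rho_{n-1}^{(S^{n-1},t^{n-1})}$. Let us also define

$$\rho_{n,p}^{(s^n,*)} \triangleq \max_{t^n:\ |\mathrm{CP}[s^n,t^n]| \le p} \rho_n^{(s^n,t^n)}.$$

Consequently, we may choose:

$$\chi(S^{n-1}, +, p_n) = \min\Big\{1,\ \sqrt{2 Z_{n-1}^{(S^{n-1})}} + \rho_{n-1,p_n}^{(S^{n-1},*)}\Big\}, \tag{18a}$$

$$\chi(S^{n-1}, -, p_n) = \min\Big\{1,\ \sqrt{2 \bar{Z}_{n-1}^{(S^{n-1})}} + \rho_{n-1,p_n}^{(S^{n-1},*)}\Big\}. \tag{18b}$$

Now we would like to show that $\min_{s_n} \chi(S^{n-1}, s_n, p_n)$ gets arbitrarily small with very high probability. For this, we first need the following lemma: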

Lemma 9. For any sequence $p_n$ such that $\lim_{n\to\infty} \big(\frac{n}{2} - p_n\big) = \infty$ and any fixed $\gamma > 0$,

$$\lim_{n\to\infty} \mathbb{P}\Big[\forall i > \tfrac{n}{2}:\ \rho_{i,p_n}^{(S^i,*)} < \gamma\Big] = 1. \tag{19}$$

Proof: Observe that for fixed $p$, $\rho_{i,p}^{(s^i,*)}$ is non-increasing in $i$ (for $i > p$) by Property 2. Hence $\rho_{n/2,p_n}^{(s^{n/2},*)} < \gamma$ implies $\rho_{i,p_n}^{(s^i,*)} < \gamma$ for all $i > n/2$.

Suppose $s$ is a sequence such that for some $t \ne s$ with $|\mathrm{CP}[s,t]| \le p_n$, $\rho_{n/2}^{(s^{n/2},t^{n/2})} \ge \gamma$. Recall that $s^i$ (resp. $t^i$) denotes the subsequence of $s$ (resp. $t$) consisting of its first $i$ elements.

Define $a_i = \rho_i^{(s^i,t^i)}$ and $m_i = a_i/a_{i-1}$. It is clear that $a_{p_n+1} \le \frac{1}{3}$ and that $a_i$ is non-increasing for $i > p_n$, by Properties 3 and 2.

For any $0 < \epsilon < 1$, $a_{n/2} \ge \gamma$ implies that the number of indices $i \in \{p_n + 2, p_n + 3, \ldots, \frac{n}{2}\}$ for which $m_i < 1 - \epsilon$ is at most $\frac{\log(3\gamma)}{\log(1-\epsilon)}$, since $\gamma \le a_{n/2} = a_{p_n+1}\prod_i m_i \le \frac{1}{3}\prod_i m_i$ and every $m_i \le 1$.

Let $l = \frac{n}{2} - p_n - 1$, take $\epsilon = 1/\sqrt{l}$, and observe that the number of indices for which $m_i < 1 - 1/\sqrt{l}$ is at most

$$\frac{\log(3\gamma)}{\log(1 - 1/\sqrt{l})} \le -\log(3\gamma)\sqrt{l} = c_\gamma \sqrt{l},$$

where $c_\gamma$ is a constant that depends on $\gamma$ only (here $\log$ is the natural logarithm and we used $\log(1-x) \le -x$). These indices partition the interval $[p_n + 2 : \frac{n}{2}]$ into at most $c_\gamma\sqrt{l}$ segments, one of which must have a length of at least $c_\gamma^{-1}\sqrt{l}$. Let us only consider this "long" segment.

The fact that $m_i \ge 1 - 1/\sqrt{l}$ on this segment implies that the sign sequence $s_{p_n+2}, \ldots, s_{n/2}$ must be constant on this segment (cf. the proof of Lemma 5). The set of sequences of length $l$ which have a run of the same sign over an interval of length $c_\gamma^{-1}\sqrt{l}$ has probability at most $2l \cdot 2^{-c_\gamma^{-1}\sqrt{l}}$. However, by assumption $l = \frac{n}{2} - p_n - 1$ goes to infinity as $n$ gets large. Hence the probability of having such a sequence $s$ gets arbitrarily small when $n$ gets large. ■
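The last step of the proof of Lemma 9 bounds the probability that a uniformly random sign sequence of length $l$ contains a constant run of length $L$ by the union bound $2l \cdot 2^{-L}$ (at most $l$ starting positions, and a window of $L$ equal signs has probability $2 \cdot 2^{-L}$). The sketch below, our own illustration, computes the exact probability with a small dynamic program over the length of the current run and checks it against that bound.

```python
from fractions import Fraction

# Illustrative sketch; function names are our own.

def prob_long_run(l, L):
    """Exact P[a uniform sign sequence of length l contains a constant run of length >= L].

    Dynamic program over the length of the current run; assumes l >= 1 and L >= 2."""
    dist = {1: Fraction(1)}  # after one symbol: current run has length 1
    for _ in range(l - 1):
        nxt = {}
        for k, p in dist.items():
            if k + 1 < L:  # repeat the previous sign: run grows (dropped once it reaches L)
                nxt[k + 1] = nxt.get(k + 1, 0) + p / 2
            nxt[1] = nxt.get(1, 0) + p / 2  # switch sign: a new run starts
        dist = nxt
    return 1 - sum(dist.values())  # removed mass = sequences that reached a run of length L

def union_bound(l, L):
    """The bound 2 l 2^{-L} used at the end of the proof of Lemma 9."""
    return Fraction(2 * l, 2 ** L)

print(prob_long_run(3, 2), prob_long_run(4, 4))  # 3/4 1/8
assert all(prob_long_run(l, L) <= union_bound(l, L)
           for l in range(1, 31) for L in range(2, 11))
```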

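Lemma 10. For any sequence $p_n$ such that $\lim_{n\to\infty} \big(\frac{n}{2} - p_n\big) = \infty$ and any fixed $\alpha > 0$,

$$\lim_{n\to\infty} \mathbb{P}\Big[\forall i > \tfrac{n}{2}:\ \min_{s_i} \chi(S^{i-1}, s_i, p_n) < 2^{-4(1+\alpha)}\Big] = 1.$$

Proof: Let

$$\mathcal{G}_R(n) = \Big\{\forall i > \tfrac{n}{2}:\ \rho_{i,p_n}^{(S^i,*)} < 2^{-(5+4\alpha)}\Big\}.$$

Observe that Lemma 9 implies that for any $\delta > 0$ there exists an $n_0$ such that $\mathbb{P}[\mathcal{G}_R(n)] > 1 - \delta/2$ for $n > n_0$. Let

$$\mathcal{G}_Z(n) = \Big\{\forall i > \tfrac{n}{2}:\ Z_i^{(S^i)} \in \big[0,\ 2^{-(11+8\alpha)}\big) \cup \big(1 - 2^{-(11+8\alpha)},\ 1\big]\Big\}.$$

Likewise, the convergence of the $Z$ process implies that there exists an $n_1$ such that for any $n > n_1$, $\mathbb{P}[\mathcal{G}_Z(n)] > 1 - \delta/2$.

Now (18a) and (18b) imply that for $S \in \mathcal{G}_R(n) \cap \mathcal{G}_Z(n)$ and all $i > \frac{n}{2}$, either $\chi(S^{i-1}, +, p_n) < 2^{-4(1+\alpha)}$ or $\chi(S^{i-1}, -, p_n) < 2^{-4(1+\alpha)}$ (indeed, $\sqrt{2 \cdot 2^{-(11+8\alpha)}} = 2^{-(5+4\alpha)}$ and $2 \cdot 2^{-(5+4\alpha)} = 2^{-4(1+\alpha)}$). For $n > \max\{n_0, n_1\}$, $\mathbb{P}[\mathcal{G}_R(n) \cap \mathcal{G}_Z(n)] > 1 - \delta$, which proves the claim. ■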

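Lemma 11. Fix $\alpha > 0$ and let $m_n = 4\log_2(2(1+\alpha)n - 1)$ (as in Lemma 7). Then:

$$\lim_{n\to\infty} \mathbb{P}\Big[\max_{t^n \ne S^n:\ |\mathrm{CP}[S^n,t^n]| < m_n} \rho_n^{(S^n,t^n)} < 2^{-(1+\alpha)n}\Big] = 1.$$

Proof: For any $p$, let us define the random variable $B_{n,p} = \mathbb{1}\big[S_n = \arg\min_{s} \chi(S^{n-1}, s, p)\big]$. It is easy to see that $\mathbb{P}[B_{n,p} = 1 \mid \mathcal{F}_{n-1}] = \mathbb{P}[B_{n,p} = 0 \mid \mathcal{F}_{n-1}] = \frac{1}{2}$. Fix $\epsilon > 0$ and let

$$\mathcal{G}_B(n, p, \epsilon) = \Big\{\sum_{i=n/2+1}^{n} B_{i,p} \ge (1-\epsilon)\frac{n}{4}\Big\}.$$

Observe that $\mathbb{P}[\mathcal{G}_B(n, p, \epsilon)]$ is independent of $p$, and by the weak law of large numbers, for any $\delta > 0$ there exists an $n_0$ such that $\mathbb{P}[\mathcal{G}_B(n, p, \epsilon)] > 1 - \delta/2$ for $n > n_0$.

Fix $\alpha' > 0$ and define

$$\mathcal{G}_\chi(n) = \Big\{\forall i > \tfrac{n}{2}:\ \min_{s_i} \chi(S^{i-1}, s_i, m_n) < 2^{-4(1+\alpha')}\Big\}.$$

Observe that $\lim_{n\to\infty} \big(\frac{n}{2} - m_n\big) = \infty$ according to the definition of $m_n$. Therefore, in view of Lemma 10, there exists an $n_1$ such that $\mathbb{P}[\mathcal{G}_\chi(n)] > 1 - \delta/2$ for $n > n_1$.

For $n > \max\{n_0, n_1\}$, $\mathbb{P}[\mathcal{G}_B(n, m_n, \epsilon) \cap \mathcal{G}_\chi(n)] > 1 - \delta$, and for $S^n \in \mathcal{G}_B(n, m_n, \epsilon) \cap \mathcal{G}_\chi(n)$ and any $t^n \ne S^n$ such that $|\mathrm{CP}[S^n,t^n]| < m_n$ we have:

$$\log_2 \rho_n^{(S^n,t^n)} = \log_2 \rho_{n/2}^{(S^{n/2},t^{n/2})} + \sum_{i=n/2+1}^{n} \log_2 \frac{\rho_i^{(S^i,t^i)}}{\rho_{i-1}^{(S^{i-1},t^{i-1})}} \overset{(*)}{\le} -(1-\epsilon)\frac{n}{4} \cdot 4(1+\alpha') = -n(1-\epsilon)(1+\alpha').$$

In the above, (*) follows from Property 1 (which in particular gives $\rho_{n/2}^{(S^{n/2},t^{n/2})} \le 1$) and from observing that if $B_{i,m_n} = 1$ then $\chi(S^{i-1}, S_i, m_n) < 2^{-4(1+\alpha')}$ (as $S \in \mathcal{G}_\chi(n)$); otherwise $\chi(S^{i-1}, S_i, m_n) \le 1$. For $S \in \mathcal{G}_B(n, m_n, \epsilon)$, $B_{i,m_n}$, $\frac{n}{2} < i \le n$, equals one at least $(1-\epsilon)\frac{n}{4}$ times. Choosing $\alpha'$ and $\epsilon$ such that $(1-\epsilon)(1+\alpha') > 1 + \alpha$ proves the claim. ■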



Theorem 1. For any $\alpha > 0$,

$$\lim_{n\to\infty} \mathbb{P}\Big[\max_{t^n \ne S^n} \rho_n^{(S^n,t^n)} < 2^{-(1+\alpha)n}\Big] = 1. \tag{20}$$

Proof: The proof follows by combining the results of Lemma 7 and Lemma 11. ■

VI. Lower Bound on Probability of Error of Polar Codes

In this section, we use our results on the correlations among polarized BECs to give lower bounds on the block error probability of polar codes over the BEC. Recall the error analysis of the code: the error event $\mathcal{E}$ is the union of the error events in each of the information channels, $\mathcal{E} = \bigcup_{s \in \mathcal{A}} \mathcal{E}_s$, where $\mathcal{A} \subset \{-,+\}^n$ is the set of information bits and $\mathcal{E}_s$ denotes the error in $W_n^{(s)}$.

For a BEC, with a pessimistic assumption on the decoder³, a decision error happens exactly when an erasure happens. Therefore, $\mathcal{E}_s = \{E_n^{(s)} = 1\}$ and the union bound gives us:

$$\mathbb{P}[\mathcal{E}] \le \sum_{s \in \mathcal{A}} Z_n^{(s)}. \tag{21}$$

A trivial lower bound on the probability of decoding error is obtained by observing that $\mathcal{E} \supseteq \mathcal{E}_s$; hence $\mathbb{P}[\mathcal{E}] \ge \mathbb{P}[\mathcal{E}_s]$ for any $s \in \mathcal{A}$. In particular,

$$\mathbb{P}[\mathcal{E}] \ge \max_{s \in \mathcal{A}} \mathbb{P}[\mathcal{E}_s] = \max_{s \in \mathcal{A}} Z_n^{(s)}. \tag{22}$$

³ A practical decoder can break ties randomly, which increases the chance of decoding the bit correctly to $\frac{1}{2}$. An analysis analogous to the one we do in this section applies to such a decoder.



However, having the second-order statistics, one can use the inclusion-exclusion principle to obtain a much tighter lower bound on the probability of error.

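Lemma 12. Let $W$ be a BEC($e$) and $\mathcal{C}_n$ be a polar code of block length $N = 2^n$ with information bits $\mathcal{A}_n$. The block error probability of such a code, $P_e(\mathcal{C}_n)$, is lower-bounded as:

$$P_e(\mathcal{C}_n) \ge \sum_{s \in \mathcal{A}_n} Z_n^{(s)} - \frac{1}{2} \sum_{\substack{s,t \in \mathcal{A}_n \\ s \ne t}} \Big[Z_n^{(s)} Z_n^{(t)} + \rho_n^{(s,t)} \sqrt{Z_n^{(s)} \bar{Z}_n^{(s)} Z_n^{(t)} \bar{Z}_n^{(t)}}\Big], \tag{23}$$

where the $\mathbf{Z}_n$ vector and the $\boldsymbol{\rho}_n$ matrix can be computed via the single-step recursions explained in Section III.

Proof: The result follows by applying the inclusion-exclusion principle, truncated after the pairwise terms, to lower-bound the probability of $\bigcup_{s \in \mathcal{A}_n} \{E_n^{(s)} = 1\}$, and noting that by (2) and (12), $\mathbb{P}[E_n^{(s)} = 1, E_n^{(t)} = 1] = Z_n^{(s)} Z_n^{(t)} + \rho_n^{(s,t)} \sqrt{Z_n^{(s)} \bar{Z}_n^{(s)} Z_n^{(t)} \bar{Z}_n^{(t)}}$. ■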

While the lower bound given by Lemma 12 is already useful in practice (see Section VII), we seek a lower bound that is theoretically more significant.

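Theorem 2. Let $W$ be a BEC($e$) and $R < 1 - e$. Let $\mathcal{C}_n$ be a polar code of block length $N = 2^n$ with information bits $\mathcal{A}_n$ such that $|\mathcal{A}_n| = \lceil NR \rceil$. Let $P(N, R, e)$ be defined as the sum of the $\lceil NR \rceil$ smallest elements of the vector $\mathbf{Z}_n$. Then, for any fixed $\delta > 0$ and sufficiently large $n$:

$$(1-\delta)\,P(N, (1-\delta)R, e) \le P_e(\mathcal{C}_n) \le P(N, R, e).$$

Proof: The upper bound is already known and we only need to prove the lower bound. Let

$$\mathcal{D}_n = \Big\{s \in \{-,+\}^n :\ \max_{t \ne s} \rho_n^{(s,t)} < \delta 2^{-n}\Big\}.$$

By Theorem 1 we know that $\lim_{n\to\infty} |\mathcal{D}_n|/2^n = 1$. Let $\mathcal{A}'_n = \mathcal{A}_n \cap \mathcal{D}_n$ and $S'_n = \sum_{s \in \mathcal{A}'_n} Z_n^{(s)}$. It is clear that $\lim_{n\to\infty} |\mathcal{A}'_n|/|\mathcal{A}_n| = 1$. Let $\mathcal{C}'_n$ be the polar code defined by the information bits $\mathcal{A}'_n$. Then $S'_n \le P(N, R, e)$ (as $\mathcal{A}_n$ contains the $\lceil NR \rceil$ smallest elements of $\mathbf{Z}_n$), and $P_e(\mathcal{C}'_n) \le P_e(\mathcal{C}_n)$ as $\mathcal{C}'_n$ is a sub-code of $\mathcal{C}_n$.

Choose $n$ large enough such that $|\mathcal{A}'_n|/|\mathcal{A}_n| > 1 - \delta$ and $P(N, R, e) < \delta$ (note that this is possible since $R < 1 - e$ and the results of [2] show that $P(N, R, e) = O(2^{-N^{\beta}})$ for any $\beta < 1/2$). By Lemma 12,

$$S'_n - P_e(\mathcal{C}_n) \le S'_n - P_e(\mathcal{C}'_n) \le \frac{1}{2} \sum_{\substack{s,t \in \mathcal{A}'_n \\ s \ne t}} \Big[Z_n^{(s)} Z_n^{(t)} + \rho_n^{(s,t)} \sqrt{Z_n^{(s)} \bar{Z}_n^{(s)} Z_n^{(t)} \bar{Z}_n^{(t)}}\Big].$$

Observe that $\rho_n^{(s,t)} \le \delta/N$ for all $s, t$ in the above summation, that $\sum_{s \ne t} Z_n^{(s)} Z_n^{(t)} \le \sum_{s,t \in \mathcal{A}'_n} Z_n^{(s)} Z_n^{(t)} = (S'_n)^2$, and that

$$\sum_{\substack{s,t \in \mathcal{A}'_n \\ s \ne t}} \sqrt{Z_n^{(s)} \bar{Z}_n^{(s)} Z_n^{(t)} \bar{Z}_n^{(t)}} \le \sum_{s,t \in \mathcal{A}'_n} \sqrt{Z_n^{(s)}}\sqrt{Z_n^{(t)}} = \Big(\sum_{s \in \mathcal{A}'_n} \sqrt{Z_n^{(s)}}\Big)^2 \overset{(*)}{\le} |\mathcal{A}'_n| \sum_{s \in \mathcal{A}'_n} Z_n^{(s)} \le N S'_n,$$

where (*) follows by the Cauchy-Schwarz inequality⁴. Therefore,

$$S'_n - P_e(\mathcal{C}_n) \le \frac{1}{2}\Big[(S'_n)^2 + \frac{\delta}{N} N S'_n\Big] = \frac{1}{2} S'_n\,(S'_n + \delta) \le \delta S'_n,$$

where the last inequality follows by observing that $S'_n \le P(N, R, e) \le \delta$. As a result,

$$(1-\delta)\,S'_n \le P_e(\mathcal{C}_n).$$

$\mathcal{C}'_n$ is a code of rate $R' \ge (1-\delta)R$, and by definition $S'_n \ge P(N, R', e) \ge P(N, (1-\delta)R, e)$. Hence we can lower-bound the LHS of the above by substituting $S'_n$ with $P(N, (1-\delta)R, e)$, which completes the proof. ■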

VII. Numerical Results 

In this section we provide a numerical example which confirms our theoretical results. We have considered polar codes of different rates on a BEC(0.5) and computed the upper bound of (21), the trivial lower bound of (22) and the tighter lower bound of (23). We emphasize that we have computed the lower bound on the error probability exactly, by computing the correlation coefficients. We did the computations for block lengths of $N = 4096$ ($n = 12$) and $N = 16384$ ($n = 14$).

As shown in Table I, the proposed lower bound is much tighter than the trivial one. Moreover, the results show that the lower bound is very close to the upper bound of (21). This confirms that $P(N, R, e)$ (as defined in Theorem 2) is indeed a very good estimate of the block error probability of polar codes over the BEC.



⁴ For any set of $m$ numbers $x_i$, $i = 1, 2, \ldots, m$, we have $\big(\sum_{i=1}^{m} x_i\big)^2 \le m \sum_{i=1}^{m} x_i^2$.



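| $R$ | Upper bound (21) | Lower bound (22) | Lower bound (23) |
|------|------------------|------------------|------------------|
| 0.2 | $4.04 \cdot 10^{-18}$ | $3.43 \cdot 10^{-19}$ | $4.04 \cdot 10^{-18}$ |
| 0.25 | $1.87 \cdot 10^{-11}$ | $9.25 \cdot 10^{-13}$ | $1.87 \cdot 10^{-11}$ |
| 0.3 | $5.4 \cdot 10^{-7}$ | $2.29 \cdot 10^{-8}$ | $5.4 \cdot 10^{-7}$ |
| 0.35 | $8.14 \cdot 10^{-4}$ | $2.11 \cdot 10^{-5}$ | $8.12 \cdot 10^{-4}$ |
| 0.4 | $0.17$ | $3.49 \cdot 10^{-3}$ | $0.14$ |

(a) N = 4096

| $R$ | Upper bound (21) | Lower bound (22) | Lower bound (23) |
|------|------------------|------------------|------------------|
| 0.2 | $9.32 \cdot 10^{-36}$ | $4.72 \cdot 10^{-37}$ | $9.32 \cdot 10^{-36}$ |
| 0.25 | $1.32 \cdot 10^{-22}$ | $3.54 \cdot 10^{-24}$ | $1.32 \cdot 10^{-22}$ |
| 0.3 | $2.32 \cdot 10^{-13}$ | $5.4 \cdot 10^{-15}$ | $2.32 \cdot 10^{-13}$ |
| 0.35 | $2.63 \cdot 10^{-7}$ | $3.61 \cdot 10^{-9}$ | $2.63 \cdot 10^{-7}$ |
| 0.4 | $5.47 \cdot 10^{-3}$ | $4.91 \cdot 10^{-5}$ | $5.43 \cdot 10^{-3}$ |

(b) N = 16384

TABLE I: Bounds on Block Error Probability of Polar Code on BEC(0.5)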